Entry Name:  "UKON-Seebacher-MC2"

VAST Challenge 2017
Mini-Challenge 2

 

 

Team Members:

Daniel Seebacher, University of Konstanz,  daniel.seebacher@uni.kn     PRIMARY
Bruno Schneider, University of Konstanz, bruno.schneider@uni.kn

Michael Behrisch, Harvard University, behrisch@g.harvard.edu

 

Student Team:  NO

 

Tools Used:

KNIME,

Tableau,

Java + Piccolo2d,

Javascript (Angular, D3.js, …)

 

Approximately how many hours were spent working on this submission in total?

Michael: 10 Days 5 Hours

Daniel: ~40-50 hours.

Bruno: ~40-50 hours.

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2017 is complete? YES

 

Video

Due to sickness, only part 1 of the video was uploaded.

 vast-mc2-2017.wmv

 

 

Questions

MC2.1 – Characterize the sensors’ performance and operation.  Are they all working properly at all times?  Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.

To examine whether the sensor data contain errors or incorrect measurements, we created a matrix-based visualization for each monitor, showing the readings of all four chemicals. The resulting overview is shown in Figure 1. Each rectangle in a matrix represents one timestamp, and the rectangles are laid out chronologically, row by row. Color encodes the reading value using min-max normalization and ranges from light gray (0) to dark blue (the maximum measured value for a chemical). Magenta indicates missing values. There are individual missing values for Appluimonia and Chlorodinine, which could be the result of isolated sensor malfunctions. However, there are hundreds of missing values for Methylosmolene across all monitors, which cannot be explained by isolated sensor malfunctions.

monitoring_valueColoring

Figure 1: Matrix-based visualization of each monitor for each reading. Each rectangle in these matrix visualizations represents one timestamp, and the rectangles are laid out chronologically, row by row.
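The color mapping described above can be sketched in a few lines of Python. This is only an illustration, not the code used for Figure 1: the RGB triples for light gray, dark blue, and magenta are assumed values, and the function name min_max_colors is hypothetical.

```python
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]

# Assumed RGB values; the exact colors used in Figure 1 are not given.
LIGHT_GRAY: Color = (230, 230, 230)
DARK_BLUE: Color = (8, 48, 107)
MAGENTA: Color = (255, 0, 255)


def min_max_colors(readings: List[Optional[float]]) -> List[Color]:
    """Map each reading to a color; None (a missing value) becomes magenta."""
    present = [r for r in readings if r is not None]
    lo, hi = (min(present), max(present)) if present else (0.0, 0.0)
    span = hi - lo
    colors: List[Color] = []
    for r in readings:
        if r is None:
            colors.append(MAGENTA)
            continue
        # Min-max normalization to [0, 1]; a constant series maps to 0.
        t = (r - lo) / span if span > 0 else 0.0
        # Linear blend between the two end colors, channel by channel.
        colors.append(tuple(round(a + t * (b - a))
                            for a, b in zip(LIGHT_GRAY, DARK_BLUE)))
    return colors
```

Normalizing per chemical, as the text describes, would mean calling the function once for each chemical's series, so that each chemical uses its own maximum.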

Switching the color scale away from a value-based mapping reveals an unexpected result. We use a new color scale that encodes the number of readings per timestamp. Normally, we would expect exactly one reading per chemical per timestamp, e.g.

Monitor   Timestamp      Chemical         Reading
1         4/1/16 0:00    AGOC-3A          0.722303
1         4/1/16 0:00    Appluimonia      0.130435
1         4/1/16 0:00    Chlorodinine     1.25917
1         4/1/16 0:00    Methylosmolene   2.63064

 

“Monitor 1, 4/1/16 0:00, AGOC-3A, 2.68382”. However, there are exactly 214 occurrences in which we observe multiple readings of one chemical for a single timestamp, e.g., two readings for AGOC-3A in the following example.

Monitor   Timestamp      Chemical         Reading
5         4/1/16 16:00   AGOC-3A          0.682529
5         4/1/16 16:00   Appluimonia      0.0176918
5         4/1/16 16:00   Chlorodinine     0.263674
5         4/1/16 16:00   AGOC-3A          6.36965

We changed the color scale to show these patterns, as shown in Figure 2. We can immediately see that for each missing value of Methylosmolene there is an additional reading for the chemical AGOC-3A. Since we observe this behavior across all available monitors, we can most likely rule out individual sensor malfunctions. We therefore hypothesize that these readings of the dangerous chemical Methylosmolene were relabeled as the harmless chemical AGOC-3A.

monitoring_countColoring

Figure 2: Using the number of readings per chemical per timestamp as a new color scale, we immediately see that for each missing Methylosmolene value, we have an additional reading of AGOC-3A.
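The count-based check behind this view can be reproduced with a simple grouping. The sketch below is an assumption about how such a check could be written, not the KNIME workflow actually used. The sample rows are the monitor 5 example from above, and the names count_readings and find_anomalies are hypothetical.

```python
from collections import Counter

CHEMICALS = ("AGOC-3A", "Appluimonia", "Chlorodinine", "Methylosmolene")

# (monitor, timestamp, chemical, reading): the monitor 5 example from the text.
rows = [
    (5, "4/1/16 16:00", "AGOC-3A", 0.682529),
    (5, "4/1/16 16:00", "Appluimonia", 0.0176918),
    (5, "4/1/16 16:00", "Chlorodinine", 0.263674),
    (5, "4/1/16 16:00", "AGOC-3A", 6.36965),
]


def count_readings(rows):
    """Count readings per (monitor, timestamp, chemical)."""
    return Counter((monitor, ts, chem) for monitor, ts, chem, _ in rows)


def find_anomalies(rows):
    """Return (duplicated, missing) lists of (monitor, timestamp, chemical)."""
    counts = count_readings(rows)
    slots = sorted({(monitor, ts) for monitor, ts, _ in counts})
    duplicated, missing = [], []
    for monitor, ts in slots:
        for chem in CHEMICALS:
            n = counts.get((monitor, ts, chem), 0)
            if n > 1:
                duplicated.append((monitor, ts, chem))
            elif n == 0:
                missing.append((monitor, ts, chem))
    return duplicated, missing


duplicated, missing = find_anomalies(rows)
print(duplicated)  # [(5, '4/1/16 16:00', 'AGOC-3A')]
print(missing)     # [(5, '4/1/16 16:00', 'Methylosmolene')]
```

For the example rows, the duplicated AGOC-3A reading and the missing Methylosmolene reading fall on the same monitor and timestamp, which is the pairing visible in Figure 2.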

Additionally, we found a very interesting pattern in the readings of monitor 4. In Figure 3 we see that the readings for Appluimonia and Chlorodinine increase drastically with each passing month. This indicates that either the sensor is malfunctioning for these two chemicals or the output of those chemicals is indeed increasing. However, since this pattern occurs only for monitor 4, we assume that it is a sensor malfunction.

 

weird_measurements_appl_chlo_monitor4

Figure 3: Interesting pattern in the readings for the chemicals Appluimonia and Chlorodinine at monitor 4. Over the course of April, August, and December, the readings for both chemicals increase drastically each month.
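A month-over-month drift like the one in Figure 3 can also be checked numerically by comparing monthly means. The following is a minimal sketch that assumes timestamps in the format used in the tables above. The function names are hypothetical, and the sample values in the usage note are made up for illustration, not taken from the data.

```python
from collections import defaultdict
from datetime import datetime


def monthly_means(readings):
    """readings: iterable of (timestamp string, value). Returns {month: mean}."""
    sums = defaultdict(lambda: [0.0, 0])  # month -> [running sum, count]
    for ts, value in readings:
        month = datetime.strptime(ts, "%m/%d/%y %H:%M").month
        sums[month][0] += value
        sums[month][1] += 1
    return {month: total / n for month, (total, n) in sorted(sums.items())}


def is_strictly_increasing(means):
    """True if the mean grows from each observed month to the next."""
    values = [means[m] for m in sorted(means)]
    return all(a < b for a, b in zip(values, values[1:]))
```

With illustrative input such as `[("4/1/16 0:00", 1.0), ("8/1/16 0:00", 4.0), ("12/1/16 0:00", 8.0)]`, `monthly_means` returns one mean per month and `is_strictly_increasing` reports whether the drift is monotone. Running the same check for every monitor would show whether the drift is indeed confined to monitor 4.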

Another pattern we observed concerns the wind direction and speed. In contrast to the hourly sensor readings, wind measurements are available only every three hours. We used linear interpolation to fill the missing hours. However, there is a large gap from 8/1/16 0:00 until 8/4/16 17:00, shown by the magenta rows in Figure 4.

 

monitoring_wind

Figure 4: Large magenta rows indicate a large gap in the data starting from 8/1/16 0:00 until 8/4/16 17:00.
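The gap filling for the wind data can be sketched as follows. This is an illustration under assumptions, not the original preprocessing: hours are given as integer offsets, the function name interpolate_hourly is hypothetical, and leaving gaps longer than three hours unfilled (max_gap_hours) is an assumed rule that mirrors the magenta rows in Figure 4. Plain linear interpolation is adequate for wind speed. For wind direction it ignores the wrap-around at 360 degrees, so a circular interpolation may be preferable there.

```python
from typing import Dict, List, Optional, Tuple


def interpolate_hourly(samples: List[Tuple[int, float]],
                       max_gap_hours: int = 3) -> Dict[int, Optional[float]]:
    """Fill hourly values between (hour, value) samples sorted by hour.

    Hours inside a gap longer than max_gap_hours are left as None.
    """
    result: Dict[int, Optional[float]] = {}
    for (h0, v0), (h1, v1) in zip(samples, samples[1:]):
        gap = h1 - h0
        for h in range(h0, h1):
            if h == h0:
                result[h] = v0          # measured value, kept as is
            elif gap > max_gap_hours:
                result[h] = None        # too far from a measurement
            else:
                result[h] = v0 + (v1 - v0) * (h - h0) / gap
    if samples:
        last_hour, last_value = samples[-1]
        result[last_hour] = last_value
    return result


hourly = interpolate_hourly([(0, 3.0), (3, 6.0), (12, 6.0)])
print(hourly[1], hourly[2], hourly[5])  # 4.0 5.0 None
```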

MC2.2 – Now turn your attention to the chemicals themselves.  Which chemicals are being detected by the sensor group?  What patterns of chemical releases do you see, as being reported in the data?

Limit your response to no more than 6 images and 500 words.

We first preprocessed the data to include the anomalies measured for the chemical Methylosmolene. To validate our hypothesis that some of the anomalies are instances of retrievable patterns, we built an overview of all anomalies. For each anomaly, we used a glyph representation that shows time series matrices for the anomaly date plus/minus three days, as shown in Figure 6.

Figure 6: Close-up of one MatrixFlower glyph showing four distinct time series (TS) matrices for one monitor and one specific anomaly date. Each cell of a TS matrix represents one possible interval of the time series. The x and y axes (always the adjacent leg and the opposite leg) depict all possible start and end dates, respectively. The cells on the diagonal show the actual values of the TS. The intervals are represented by their means. The four TS matrices are arranged at increasing 90-degree angles. In the top-left TS matrix we see that the wind direction was unstable at the beginning (bottom left of the TS matrix) and then changed only slightly over the course of the three days. The bottom-left TS matrix shows Methylosmolene. Here, a prominent rectangle represents a single burst in the TS shortly before the anomaly date.
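The construction of a single TS matrix, as described in the caption, can be sketched like this. It is a minimal illustration rather than the implementation behind the glyph. The function name interval_mean_matrix is hypothetical, and the use of prefix sums is a design choice made here so that each interval mean is computed in constant time.

```python
from typing import List, Optional


def interval_mean_matrix(series: List[float]) -> List[List[Optional[float]]]:
    """Cell [start][end] holds the mean of series[start..end] (inclusive).

    The diagonal (start == end) holds the raw values. Cells with
    end < start describe no interval and stay None.
    """
    n = len(series)
    prefix = [0.0]
    for value in series:
        prefix.append(prefix[-1] + value)  # prefix[k] = sum of first k values
    matrix: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    for start in range(n):
        for end in range(start, n):
            total = prefix[end + 1] - prefix[start]
            matrix[start][end] = total / (end - start + 1)
    return matrix


m = interval_mean_matrix([1.0, 3.0, 5.0])
print(m[0][0], m[0][1], m[0][2])  # 1.0 2.0 3.0
```

A short burst in the series shows up as a block of elevated means around the corresponding diagonal cell, which is the "prominent rectangle" mentioned in the caption.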

 

Figure 7: Overview of all 214 anomalies, each shown as a “Matrix Flower Glyph” representing the four values WindDirection (top left), WindSpeed (top right), Methylosmolene (bottom left) and AGOC-3A (bottom right). Several patterns of similar behavior are visible in the view.

 

Figure 8: Similar wind patterns and overall low values for the main chemicals, except for one burst in the readings time series.

 

Figure 9: Unspecific wind patterns often lead to homogeneous reading time series.

 

Figure 10: Specific strong bursts in Methylosmolene are often “interrupted” and followed by very low/normal readings. Note the prominent light rectangle in the lower-right time series matrix.

Since these patterns appear to be quite characteristic, we are experimenting with automatic retrieval of similar time series matrices. For this purpose, we calculate a feature descriptor (JCD [JCDDescriptor]) for each time series matrix. This composite descriptor combines two “subfeature” descriptors, CEDD and FCTH, and thus captures both color and texture information. We use the Tanimoto coefficient for our similarity calculation.

Figure 11: Image-based similarity calculation for retrieving similar time series matrices. A connection line depicts the similarity value: a light gray, transparent line indicates dissimilar items, and a prominent red, opaque line indicates a very similar TS matrix.
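For reference, the Tanimoto coefficient of two real-valued feature vectors a and b is a·b / (|a|² + |b|² − a·b). The sketch below computes it for plain lists of numbers. It does not compute the JCD descriptor itself, the function name tanimoto is hypothetical, and treating two all-zero vectors as identical is an assumption.

```python
from typing import Sequence


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto coefficient of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Assumption: two all-zero descriptors count as identical.
    return dot / denom if denom else 1.0


print(tanimoto([1, 2, 3], [1, 2, 3]))  # 1.0
print(tanimoto([1, 1], [1, 0]))        # 0.5
print(tanimoto([1, 0], [0, 1]))        # 0.0
```

For non-negative descriptors the value lies between 0 (nothing in common) and 1 (identical), so it can be mapped directly onto a connection-line encoding like the one in Figure 11.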

 

To approach the question of which AGOC-3A value is the correct one, we use an anomaly component that shows both alternatives (either the first or the second value is correct).

FireShot Capture 1 - MatrixFlower - http___localhost_4200_anomalydetail

Figure 12: Anomaly detail view. In a comparative view, we can examine the time series behavior using either only the first (AGOC-3A1) or only the second (AGOC-3A2) of the reading values for AGOC-3A. Here we see that AGOC-3A1 was likely used to defer the sensor values for Methylosmolene.

MC2.3 – Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.

Limit your response to no more than 8 images and 1000 words.

To find out which factories are responsible for which chemical releases, we follow the information-seeking mantra. We start with an overview visualization of all chemical readings, for all monitors and all timestamps, as shown in Figure 13. In this visualization, we can see several eye-catching patterns of chemical releases, which we investigate to find the probable polluters: Use-Case 1 (red), Use-Case 2 (green), Use-Case 3 (purple), Use-Case 4 (blue) and Use-Case 5 (yellow). For these use cases, we show how extending our application to incorporate wind direction and speed lets us identify the most probable sources of the pollution.

 

overview - highlighted

Figure 13: Overview visualization showing all chemical readings of all monitors for each timestamp. Highlighted are different use cases, which we will investigate further: Use-Case 1 (red), Use-Case 2 (green), Use-Case 3 (purple), Use-Case 4 (blue) and Use-Case 5 (yellow).

Use-Case 1 (red):

 

Here we take a closer look at the Chlorodinine readings of monitor 6, which exhibit the most peaks. By zooming in and extending the visualization to show the wind direction and speed, we obtain the visualization shown in Figure 14. The arrow direction indicates where the wind originates, and the arrow length indicates the wind speed. We can see that whenever the Chlorodinine reading is very high, the wind originates from west-south-west. By placing these arrows on the map, we can immediately see the source of the pollution, as shown in Figure 15. Our investigation shows that the most probable source of the Chlorodinine pollution is Kasios.

 

zoom_chlor_monitor6

Figure 14: High-detail view of the readings for the chemical Chlorodinine of Monitor 6. Arrow direction indicates the origin of wind and arrow length indicates wind speed.

source_of_pollution

Figure 15: Source of pollution of the chemical Chlorodinine measured at monitor 6. We can see that Kasios is the most probable polluter.
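The visual reasoning in this use case, tracing the wind arrow back from the monitor to a factory, can be approximated numerically. The sketch below is an assumption about how such a check could look, not part of our application. The coordinates and the names "Factory A" and "Factory B" are placeholders rather than map positions from the challenge data, the 20-degree tolerance is arbitrary, and wind direction is taken as the compass direction the wind comes from.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) on the map, with y increasing to the north


def bearing_deg(origin: Point, target: Point) -> float:
    """Compass bearing from origin to target (0 = north, 90 = east)."""
    dx, dy = target[0] - origin[0], target[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two compass angles."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def upwind_candidates(monitor: Point, factories: Dict[str, Point],
                      wind_from_deg: float,
                      tolerance_deg: float = 20.0) -> List[str]:
    """Names of factories lying roughly in the direction the wind comes from."""
    return sorted(
        name for name, position in factories.items()
        if angular_diff(bearing_deg(monitor, position), wind_from_deg) <= tolerance_deg
    )


# Placeholder layout: one factory to the west-south-west, one to the east.
factories = {"Factory A": (-10.0, -4.0), "Factory B": (10.0, 0.0)}
print(upwind_candidates((0.0, 0.0), factories, wind_from_deg=247.5))  # ['Factory A']
```

Applied to every timestamp with a high reading, the factory that appears most often among the upwind candidates would be the most probable source.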

Use-Case 2 (green):

 

We see a stark increase in the pollutants Appluimonia and Chlorodinine over each month for monitor 4 and a very irregular pattern of measurements for monitor 3 in Figure 16. However, a closer look at the data shows no consistent relationship between high readings and wind direction. This indicates that the chemical readings are either faulty or that the pollution originates from somewhere else.

 

mon4_applmon3_appl

Use-Case 3 (purple):

 

This might be the most interesting reading since, as we showed in MC2.2, Methylosmolene readings are consistently manipulated to show up as AGOC-3A readings. Since Methylosmolene is a very dangerous chemical and AGOC-3A is considered harmless, such a manipulation is plausible. If we compare the wind direction at the times where values are missing at monitor 6, we see that for each missing value the wind originates from the east. This suggests that either Kasios or Roadrunner is responsible for the Methylosmolene pollution, but we cannot determine which of the two, or whether one of them changed the Methylosmolene readings to show up as AGOC-3A readings.

 

meth_mon6

 

 

Use-Case 4 (blue):

The Appluimonia readings from monitor 9 show that high readings occur when the wind originates from the north, i.e. from inside the nature preserve. This indicates that the source is not one of the companies, since no company is located north of monitor 9.

 

appl_mon9

 

Use-Case 5 (yellow):

 

For the chemical AGOC-3A, we see that high readings occur when the wind originates from the east. This again indicates that Roadrunner or Kasios is the source of this pollutant.

 

agoc_mon6

 

 

Conclusion:

 

During our examination of the data, we found a lot of circumstantial evidence indicating that Kasios is the main polluter for the chemical Chlorodinine. Additionally, we found that Kasios and Roadrunner are the most likely polluters for the chemicals Methylosmolene and AGOC-3A. We see a very interesting pattern for the chemicals Chlorodinine and Appluimonia at monitors 3 and 4. However, we cannot determine a source of pollution there, since there is no clear pattern in the wind direction. Finally, we saw clear Appluimonia pollution measured at monitor 9 while the wind originates from the north, i.e. from inside the nature preserve. This indicates that there is an additional source of pollution that is not one of the companies.

 

REFERENCES:

[JCDDescriptor] Zagoris, Konstantinos, et al. "Automatic image annotation and retrieval using the joint composite descriptor." Informatics (PCI), 2010 14th Panhellenic Conference on. IEEE, 2010.